A CANDIDATE AUTOMATED TEST BATTERY FOR
NEUROPSYCHOLOGICAL SCREENING OF AIRMEN:
DESIGN AND PRELIMINARY VALIDATION


INTRODUCTION 

The neurological screening tests carried out routinely in the course
of an airman physical certification examination are designed to detect
a broad range of sensory-motor abnormalities, with particular emphasis
on the cranial nerves. This examination may or may not be accompanied
by a relatively informal mental status examination exploring
psychiatric and cognitive functions (Siassi, 1984). However, with
increasing concerns about the need to assess higher mental functions,
it is recognized that the scope and sensitivity of neuropsychological
aspects of the examination must be expanded. For instance, a panel of
the American Medical Association recently convened by the Federal
Aviation Administration recommended that a computerized test of
cognitive function be developed “... that would detect significant
cognitive impairments that may otherwise go unrecognized during a
routine physical examination” (AMA, 1984).

The problem is that current neuropsychological screening and mental
status examinations were designed to detect symptoms of relatively
severe sensory, motor, or cognitive pathology. While the tests appear
relatively good in detecting such severe organic illness, with
accuracy ranging from 60 to 70 percent (Webster, Scott, Nunn, McNeer,
and Varnell, 1984), they were not intended to be so sensitive that
they could be used as early indicators of disturbances of higher-level
mental function. The “cognitive function” portion of the traditional
mental status examination is typically limited to observing the
patient’s orientation for time and place, knowledge of birthdate and
age, and some historical or geographical reference, such as the name
of the current President or the location of the test (Siassi, 1984).
Although this may be supplemented by observing the patient’s form of
thinking (logical or illogical) and ability to abstract, it is easy to
see that this examination does not challenge higher mental functions.
In fact, it has undergone little change from the time it was
introduced by Adolf Meyer in 1902, despite significant theoretical
advances in the field of cognitive psychology (Gardner, 1987).

In response to this need, a computerized test battery based on current
cognitive theory has been developed that provides a brief screening
for disturbances in higher-level cognitive function. This report
describes the background and composition of this test, and the results
of initial validation and sensitivity studies.


Background 

Many attempts to make the traditional mental status examination more
objective have been carried out. A good sampling of these has been
described by Nelson, Fogel, and Faust (1986). One, the Mini-Mental
Status Examination (MMSE) (Folstein, Folstein, and McHugh, 1975), is
of particular interest since the AMA Committee referred to above,
after considering existing test procedures, recommended to the FAA
that the MMSE be used in the routine cognitive screening of candidate
airmen until a more definitive test battery could be developed (AMA,
1984). The MMSE is a general-purpose cognitive screening test
consisting of 11 items and requiring 5 to 10 minutes to administer.
The tests measure orientation to time and place, registration (naming
3 objects and remembering them), attention and calculation
(serial-seven subtraction), recall (remembering the 3 objects repeated
above), language (naming, repeating, and following commands), motor
behavior (eye closing), sentence production, and copy design. The
patient’s level of consciousness is also subjectively evaluated along
a continuum from alert to coma.

Nelson et al. (1986) report the results of 35 publications dealing
with the MMSE. Five tests of reliability were reported, revealing a
range from .83 to .99 in psychiatric, neurological and mixed patients.
A total of 11 validation studies covering 859 subjects have been
carried out, with the percentage of correct classification ranging
widely, depending on the pathology involved. For instance,
non-psychotic psychiatric inpatients without diagnosed organic mental
disorders, and patients with focal right hemisphere lesions, almost
all pass this test, while patients suffering from depression with
cognitive impairment usually fail. Anthony, LeResche, Niaz, von Korff,
and Folstein (1982) report an overall false positive rate of 39
percent and a false negative rate of 5 percent. The correlation
between the MMSE and the Wechsler Adult Intelligence Scale is
reasonably high (between .55 and .78 for the verbal portion, and
between .55 and .66 for the performance portion) (Dick, Guiloff,
Stewart, Blackstock, Bielawska, Paul, and Marsden, 1984). Subsequent
to the AMA recommendation, the MMSE was validated in three studies
with respect to its ability to discriminate between civil pilots and
neurological or psychiatric patients. In all three, results were
extremely disappointing, with false negative rates as high as 96
percent (LeRoux, 1988).
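
The false positive and false negative rates quoted for the MMSE can be
read as ratios taken from a two-by-two classification table. The
following is a minimal sketch, assuming the conventional definitions
(false positive rate as the share of unimpaired subjects who fail the
test, false negative rate as the share of impaired subjects who pass
it). The cited studies may have used somewhat different definitions,
and the counts below are invented purely to illustrate the arithmetic.

```python
def error_rates(true_pos, false_pos, true_neg, false_neg):
    """Return (false_positive_rate, false_negative_rate) from 2x2 counts."""
    unimpaired = false_pos + true_neg   # subjects without the condition
    impaired = true_pos + false_neg     # subjects with the condition
    if unimpaired == 0 or impaired == 0:
        raise ValueError("need at least one subject in each true class")
    return false_pos / unimpaired, false_neg / impaired


# Hypothetical counts, not data from any study cited in this report.
fpr, fnr = error_rates(true_pos=45, false_pos=20, true_neg=80, false_neg=5)
print(f"false positive rate = {fpr:.2f}, false negative rate = {fnr:.2f}")
```

For a screening instrument the two errors carry different costs: a
false negative lets an impaired subject through, while a false positive
only triggers further testing.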

Foundations of the new test battery.

The test battery proposed here evolved from a relatively new
theoretical orientation, and utilizes a “step” procedure to minimize
testing time. These approaches are described briefly below.

While traditional neuropsychological tests have utilized atheoretical
clinical or empirical approaches to test construction, the present
battery was developed with a specific theoretical position in mind.
Most recent formulations of the nature of human cognition postulate
that it is multi-dimensional, i.e., separate processing mechanisms
exist for general categories of cognitive function. This
multi-processor hypothesis proposes that two activities can be
conducted without mutual impairment, as long as each one utilizes a
different information processing structure (Allport, 1980; see Colley
and Beech, 1989 for a review). Based on this kind of data, Wickens
(1984) proposed a “multiple-resources” theory, suggesting that there
are at least three kinds of resources, each varying in two dimensions.
The first resource involves sense modality, and primarily involves
visual versus auditory processing. The second resource divides the
above into early and late stages of processing. The third resource
addresses the type of central processing carried out (spatial or
verbal).

In the present test battery development, the multiple resources theory
provided a basic starting point. The goal was to use specifically
targeted tests to sample as many of the resources postulated by
Wickens as possible. To this end, spatial and verbal functions
requiring various memory demands and processing sequences were
included. In addition, tests of psychomotor control and some of the
best traditional clinical tests were included to provide the
broad-based screening desired for the first tests in the battery.

The second relatively new characteristic of the present test battery
was incorporated to answer the need for a brief screening test which
was also of some diagnostic use to the clinician. The “step” approach,
as described by Russell (1984), was adopted for this purpose. In this
approach, the battery is organized into steps. If the person fails
tests in the first step of the battery, the next set of tests is
administered to verify and elaborate the indications of the first
step. Thus, each step acts as a screening battery for the next, more
detailed step.
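
The control flow of a step battery can be sketched in a few lines. The
example below is only an illustration of the gating idea described
above, not the NTB program itself: the function name, the test names,
and the representation of each test as a callable returning pass or
fail are all assumptions made for the sketch.

```python
from typing import Callable, Dict, List

# Each test is a hypothetical callable that returns True when the subject passes.
Test = Callable[[], bool]


def run_step_battery(steps: List[Dict[str, Test]]) -> Dict[str, object]:
    """Administer steps in order, going deeper only after a failure."""
    failures: Dict[int, List[str]] = {}
    last_level = 0
    for level, tests in enumerate(steps, start=1):
        last_level = level
        # Every test in the current step is given before deciding what to do next.
        failed = [name for name, test in tests.items() if not test()]
        if not failed:
            break  # a clean step ends the examination
        failures[level] = failed
    return {"last_level": last_level, "failures": failures}


# Illustrative three-step battery with made-up test names and outcomes.
steps = [
    {"test_a": lambda: True, "test_b": lambda: False},
    {"test_c": lambda: True},
    {"test_d": lambda: False},
]
result = run_step_battery(steps)
print(result)
```

In this run the subject fails one first-step test, passes the second
step, and is therefore never given the third step, which is the source
of the time saving.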

Versions of such a step approach to testing appear to be gaining in
popularity. Tarter and Edwards (1986), for example, suggest that brief
screening tests be used to explore “core elements of a
neuropsychological examination” (attention, memory, perception,
language, visual-spatial, and psychomotor processes). If indicated by
results on these tests, a second stage would give a standard
“subbattery” to specific individuals. Problems at this second level
would signal the need for highly individualized testing. This approach
has been incorporated into the 1.5 hour Pittsburgh Initial
Neuropsychological Test System (PINTS) (Goldstein, Tarter, Shelly, and
Hegedus, 1983). However, as used to date, the step approach has not
been truly incorporated into a computer-driven brief
neuropsychological screening battery of the type required in the
present effort.

Brief  description  of  the  Neuropsychological 
Test  Battery  (NTB). 

Recognizing that the development of a test battery is an iterative
process, a preliminary set of candidate tests was selected. These
tests were implemented in a “breadboard” fashion, using
paper-and-pencil tests and a Commodore computer. The first validation
experiment utilized this version (LeRoux, 1988). Based on the results
of this experiment, a revised first candidate version (1.0) was
created, consisting of a different combination of the original tests.
This version was administered to 121 subjects and, based on these
analyses, a second-generation breadboard version (1.1) and a
fully-computerized version (2.0) were developed and subjected to
preliminary experimental validation. In this section, the tests
comprising these various versions are described.

The  original  list  of  candidate  tests  considered  in  the 
preliminary  version  consisted  of  the  following: 

1.  Trail  Making  Test.  This  is  a  test  of  “visual- 
conceptual  and  visuomotor  tracking”  (Lezak,  1983). 
In  the  first  part  of  the  test  (Trails  A)  25  numbered 
circles  are  to  be  joined  in  sequence.  In  the  second  part 
(Trails  B)  the  25  circles  are  numbered  1  to  13  and  A 
to  L,  and  they  are  to  be  joined  in  an  alternating 
sequence.  The  test  has  consistently  proven  to  be  one 
of  the  best  general  screening  instruments  for  diffuse 
brain  injury  (Spreen  and  Benton,  1965).  In  addition, 
it  has  been  shown  to  be  decremented  in  chronic 
alcoholics,  in  certain  neurological  conditions,  and  in 
psychiatric  conditions  (Lezak,  1983). 

2. Symbol Digit Substitution Test. This modification of the
digit-symbol subtest from the Wechsler intelligence scales is based
on the work of Smith (1968). It requires the subject to substitute
numbers for geometric symbols. It appears to require visual
perceptual, visual scanning, and attention allocation resources. It
is reported to be more consistently sensitive to brain damage than
any other Wechsler Adult Intelligence Scale subtest, and to show
decrement even when damage is minimal.

3. Color-Word Test. This test, modified from the original Stroop Test
(Stroop, 1935), requires the subject to name the color in which a
word is written, even though the word may be the name of that color,
or of a different color. It is a measure of the speed with which a
person can inhibit an overlearned perceptual set (reading the word)
and conform to changing demands. As such, it appears to tap several
of the central processing and response organization resources of the
multiple-resources model.

4. Unstable Tracking Test. In this test, the subject must keep a
computer-generated “target” centered with a tracking knob (version
1.1) or a joystick (version 2.0), while the computer generates
offsets for the target. This test has considerable content validity
as a sensitive visual-motor coordination test. In pretests of the
current battery (LeRoux, 1988) this test proved to be one of the best
general discriminators between normal and pathological groups.

5.  Continuous  Performance  Test.  This  Dynamic 
Memory  Test  is  modeled  after  procedures  described 
by  Moore  and  Ross  (1963)  and  Hunter  (1975).  The 
basic  design  for  the  present  version  of  the  test  was 
developed  by  Shingledecker  (1984)  as  part  of  the 
Criterion  Task  Set  for  the  U.  S.  Air  Force.  The  test 
requires  the  subject  to  note  the  bottom  number  of  a 
fraction.  When  a  new  fraction  appears,  the  subject 
must  respond  by  saying  whether  the  top  number  is  the 
same  as  the  previous  bottom  number.  However,  the 
new  bottom  number  must  first  be  noted,  because  as 
soon  as  a  response  is  given,  the  original  fraction  is 
replaced  by  a  new  one.  Again,  elements  of  numerical 
central  processing  and  response  inhibition  are  probed 
by  this  procedure. 

6.  Verbal  Thinking  Test.  This  test  is  based  on  the 
paradigm  developed  by  Posner  (1978),  and  involves 
having  the  subject  classify  two  letters  of  the  alphabet 
by  each  of  two  rules.  One  rule  involves  physical 
identity  alone  (whether  both  are  the  same  letter  in  the 
same  case).  The  other  involves  a  semantic  rule  (whether 
both  are  vowels  or  consonants).  The  test  places  high 
demands  on  semantic  memory,  and  on  rule-based 
behavior. 

7. Arithmetic Test. This is a simple test of ability to carry out
several addition and subtraction functions rapidly. It has been
adapted from the Unified Tri-Services Cognitive Performance
Assessment Battery (UTCPAB) (Perez, Masline, Ramsey, and Urban,
1987), and appears to probe specific numerical, logical, and
attention allocation functions.

8. Interval Production Test. This test is based on the work of Michon
(1966), and requires the subject to tap at a regular rate of two to
three per second for three minutes. Interest is in the variability of
the tapping. It appears that the test may measure psychomotor
stability (possibly involving the reticular-cerebellar axis), and
should be sensitive to disruptions due to either organic or
functional problems.

9. Spatial Thinking Test. This test dates from an original concept
described by Fitts (1956) and was modified by Shingledecker (1984). A
four-bar histogram is presented. After 3 seconds it is removed and
replaced (after a delay) with another histogram rotated either 90 or
270 degrees. The subject must decide whether the second histogram is
the same as the first. Intact spatial memory is required, as well as
ability to mentally manipulate spatial symbols.

10. Short-term Memory/Retrieval Test. The paradigm proposed by
Sternberg (1969) is used to probe short-term memory retrieval
processes (including sensory/perceptual and motor functions). This
test involves determining whether a “probe” letter of the alphabet is
a member of a previously memorized target set. Short-term retrieval
processes are required by this test.

11. Visual Monitoring. This test requires the subject to monitor four
dials (similar to aircraft dials) to detect a randomly-occurring bias
in one of them.

12. Logical Reasoning Test. This is a version of the logical
reasoning test proposed by Baddeley (Baddeley and Liberman, 1980). A
series of symbols is presented, along with a verbal description of
the logical relationships between them. The subject must determine
whether the logical relations described are true or not with respect
to the presented symbols. The test assesses a broad range of higher
level cognitive functions.

13. Zung Self-Rating Depression Scale (Zung, 1965). This is a 20-item
scale which yields an overall depression index, as well as sub-scores
on affect, physiological disturbance, and psychomotor disturbance.

14. Manifest Anxiety Scale. This is a 28-item self-report scale
designed to detect symptoms of “anxiousness,” primarily as manifested
in autonomic activity.

15. Shipley-Hartford Retreat Scale. This two-part test consists of a
vocabulary section and an abstract reasoning section. The vocabulary
test was used in early testing only to establish that the subject is
functioning at an acceptable intellectual level.
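
As an illustration of how one of the tests listed above can be scored
automatically, the response rule of the Continuous Performance
(Dynamic Memory) Test, item 5, can be written out directly. This is a
minimal sketch under stated assumptions: fractions are represented as
(top, bottom) pairs, a response of True means “same,” and the function
names are hypothetical rather than taken from the NTB software.

```python
from typing import List, Tuple

Fraction = Tuple[int, int]  # (top number, bottom number)


def expected_responses(fractions: List[Fraction]) -> List[bool]:
    """For each fraction after the first, True means the top number
    matches the bottom number of the preceding fraction."""
    answers = []
    for previous, current in zip(fractions, fractions[1:]):
        answers.append(current[0] == previous[1])
    return answers


def percent_correct(fractions: List[Fraction], responses: List[bool]) -> float:
    """Score a subject's same/different responses against the rule."""
    expected = expected_responses(fractions)
    if not expected or len(responses) != len(expected):
        raise ValueError("need one response per fraction after the first")
    hits = sum(given == wanted for given, wanted in zip(responses, expected))
    return 100.0 * hits / len(expected)


# Made-up stimulus sequence and responses, for illustration only.
sequence = [(3, 7), (7, 2), (5, 4), (4, 4)]
print(expected_responses(sequence))
print(percent_correct(sequence, [True, True, True]))
```

The first fraction requires no response, so a sequence of n fractions
yields n - 1 scored trials.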

Software. 

Versions 1.0 and 1.1 (the breadboard versions) were implemented on
Commodore computers. Version 2.0 was created in QuickBasic, and is
compatible with IBM XT or higher computers. QuickBasic is a common
language which is familiar to most computer users, so the program is
easily modifiable. The program automatically presents tests, evaluates
subject performance at each level, decides whether to present
subsequent levels of tests, and prints to the screen a code that tells
the examiner the results of the examination. All results are saved.
The computer program is described in more detail in Moise, O’Donnell
and Hordinsky (in preparation).

Hardware. 

The battery is configured to run on an IBM XT, AT, or true clone with
512K memory, one 360K floppy disk, either an EGA graphics card or a
Sigma Designs Color 400+ graphics card, and a color monitor that
supports the selected color graphics card. This configuration is a
reasonably “standard” PC system of the type which exists in many
physicians’ offices.

Preliminary  validation  studies. 

The long and demanding process of criterion and predictive validation
will clearly take several years to complete. However, as an initial
attempt, three validation studies involving 242 subjects have been
carried out to provide the initial assessment of the proposed battery,
as well as to provide a model for subsequent validation studies. A
preliminary study used 121 subjects to assess the Mini-Mental Status
Exam of Folstein, Folstein, and McHugh (1975), and to provide basic
data on the preliminary set of performance tests. This study, reported
in LeRoux (1988), provided the experimental basis for selection of the
initial tests in version 1.0 of the NTB. The next two studies provided
important clues with regard to subsequent modifications of the
battery, and are described in the present report.

Materials  and  Methods. 

Subjects. A total of 121 subjects were tested in the first study
reported here, with 81 individuals in the “non-pathology” group (no
history of psychiatric or neurological pathology) and 40 subjects in
the “pathology” groups. In the non-pathology group, 41 of the subjects
were active pilots, and 40 were non-pilots fulfilling the group
criteria. Twenty of the subjects in the overall cohort of the first
study were used again in a second study, as explained later.

The non-pathology subjects all agreed to participate without
compensation. The pathology subjects were recruited through the local
Veterans Administration Hospital (VAH) Center. These VA subjects were
paid at the rate of $5.00 per hour of participation. No attempt was
made to control or interfere with the normal medication for any
subject. Patients were, for the most part, well-controlled on their
present medication. They thus represented a clinical population
currently displaying only marginal symptoms.

The  pathological  groups,  and  the  number  of  subjects 
in  each  sub-category  finally  included  in  the  test  sample 
for  the  first  study,  are  described  below: 

1. Substance abuse. This group included 22 subjects currently being
treated for alcoholism and/or drug dependency. All of these subjects
were more than 90 days post-detoxification, by clinical record.

2. Seizures. Included in this group were eight individuals currently
being treated in a hospital neurology department, all of whom had
been diagnosed as having seizure disorders from various causes.

3. Depressives. Included in this group were hospital inpatients (2)
and outpatients (8) who carried a primary diagnosis involving
depression, and who were currently being treated by psychotherapy
and/or medication for that condition.

For the second experiment described below, 20 subjects who had been
evaluated with versions 1.0 and 1.1 were retested with version 2.0.
This included 5 subjects from the pilot group and 5 from the
non-pilot-normal group, in addition to 10 subjects from the
pathological groups. These latter subjects consisted of 4 from the
substance abuse category, 3 from the neurological category, and 3 from
the depression category.

Procedures  for  first  study. 

Based on the results of the preliminary study, candidate tests for
each of three levels of the battery were selected, and this was
designated as version 1.0 of the NTB. The tests selected are shown in
Table I.

TABLE I. VERSION 1.0 OF THE NTB

LEVEL 1              LEVEL 2                LEVEL 3

TRAILS A             CONT. PERFORMANCE      SPATIAL THINKING
TRAILS B             VERBAL THINKING        MEMORY TEST
SYMBOL DIGIT         ARITHMETIC             VISUAL MONITOR
COLOR WORD           INTERVAL PRODUCTION    LOGICAL REASON.
UNSTABLE TRACKING                           ZUNG DEPRESSION
                                            MANIFEST ANXIETY
                                            SHIPLEY SCALE

All subjects were given all of the Level 1 tests. After they were
finished, their scores were inspected and compared to pre-established
“pass-fail” criteria on each test. These preliminary criteria were
deliberately set to be harder to “pass” than was expected for the
final criterion measures. In this way, it was assured that all
subjects who would eventually be failed by the final (less rigorous)
criteria would also fail in this preliminary screening. In any case,
if the subject failed any test at Level 1, all of the tests at Level 2
were administered. The same logic as above was used to establish Level
2 “pass-fail” criteria, and if the subject failed any test at Level 2,
all Level 3 tests were administered.
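
The reasoning behind the deliberately strict preliminary criteria is a
simple set inclusion: any subject who fails a lenient cutoff must also
fail a stricter one on the same measure. The sketch below illustrates
this, assuming a measure on which higher scores are worse (such as a
completion time). The subject labels, scores, and both cutoff values
are hypothetical and are not the criteria used in the study.

```python
def failed_subjects(scores, cutoff):
    """Return the subjects whose score exceeds the cutoff (higher is worse)."""
    return {subject for subject, score in scores.items() if score > cutoff}


# Hypothetical completion times in seconds.
scores = {"s01": 28.0, "s02": 41.5, "s03": 36.0, "s04": 52.3}
preliminary_cutoff = 35.0  # deliberately strict screening criterion (assumed)
final_cutoff = 45.0        # more lenient final criterion (assumed)

strict = failed_subjects(scores, preliminary_cutoff)
final = failed_subjects(scores, final_cutoff)

# Everyone failed by the final criterion is also failed by the strict one.
print(sorted(strict), sorted(final), final <= strict)
```

The cost of the strict criterion is extra testing for subjects such as
"s02" and "s03" here, who would pass under the final criterion.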

All computer-generated tests in versions 1.0 and 1.1 were presented on
a Commodore SX-64 computer, using a 12-inch Commodore color monitor.
Subjects received immediate feedback after each test. In addition to
the computer-generated tests, several commercial paper-and-pencil
tests were administered in these versions. These were the Trails Test
(forms A and B), the symbol-digit test, and the Shipley Scale. These
tests were given in their standard commercial forms, using the
directions and norms provided by the test authors.

Procedures  for  the  second  study. 

From the entire group of 121 subjects who had participated in the
first experiment, 45 randomly selected subjects were contacted by the
experimenter, and requested to participate again. The first 20 to
accept in the appropriate categories were used as the subjects. Except
for the testing sequence and the completely computerized
administration, all procedures were identical to the original test
administration. As in the first study, subjects in the pathological
groups were paid for participation, while “normal” subjects received
no compensation. Every attempt was made to maintain the same
motivation level as in the first study, and it was felt that
conditions between the two test administrations were as nearly
identical as possible.


RESULTS  AND  DISCUSSION 

Results  of  the  first  study. 

A total of 62 individual measures (e.g., reaction times, percent
correct scores, and standard deviations) were generated by the
candidate battery. Summary data for each subject were analyzed in
several ways. One-way analyses of variance (ANOVA, independent groups
with unequal N) were performed on each of the dependent variables,
based on group membership in any of the experimental groups. It is
recognized that, with the large number of analyses thus carried out, a
given alpha level is not protected, and therefore individual
significances revealed in these analyses may not be precise, although
a protection factor was used. Meaningful trends should, however, be
revealed.
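
For reference, the F-ratio of a one-way ANOVA for independent groups
of unequal size can be computed from the between-group and
within-group sums of squares. The following is a generic textbook
sketch, not the software used in this study; the group labels and
scores are invented, and the protection factor mentioned above is not
modeled.

```python
from typing import Dict, List, Tuple


def one_way_anova(groups: Dict[str, List[float]]) -> Tuple[float, int, int]:
    """Return (F, df_between, df_within) for independent groups of any size."""
    if len(groups) < 2 or any(len(v) == 0 for v in groups.values()):
        raise ValueError("need at least two non-empty groups")
    all_values = [x for values in groups.values() for x in values]
    n_total = len(all_values)
    k = len(groups)
    if n_total <= k:
        raise ValueError("need more observations than groups")
    grand_mean = sum(all_values) / n_total
    ss_between = 0.0
    ss_within = 0.0
    for values in groups.values():
        mean = sum(values) / len(values)
        # Each group is weighted by its own size, which handles unequal N.
        ss_between += len(values) * (mean - grand_mean) ** 2
        ss_within += sum((x - mean) ** 2 for x in values)
    df_between = k - 1
    df_within = n_total - k
    # Raises ZeroDivisionError if every group has zero internal variance.
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Invented scores for two groups of unequal size.
f_ratio, df_b, df_w = one_way_anova({"a": [1.0, 2.0, 3.0],
                                     "b": [2.0, 3.0, 4.0, 5.0]})
print(round(f_ratio, 3), df_b, df_w)
```

The probability value for each variable is then obtained from the F
distribution with the returned degrees of freedom.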

Age. There were proportionately more older subjects represented in the
pathological groups, especially in the age ranges over 45 years. Mean
age for the pathological groups was 48.15 years, as compared to 40.27
years for the non-pathological groups (p < .001). Closer inspection of
the data, however, indicated that the age factor might not be as
important as it first appeared. The major difference in age among the
groups was between the non-pilot normal group and all others. In fact,
the non-pilot group’s average age was 34.8 years, whereas each of the
other groups averaged between 44.3 (neurologicals) and 53.8 years
(depressives), with a mean of 47.09 years. This compared to a mean age
of 45.6 years for the pilot group. Thus, the pathological groups were
not different from each other, or from the pilot subjects.
Nevertheless, age was included as a variable in all subsequent
analyses reported here. In addition, an analysis of covariance was
performed between pathological and non-pathological groups on all of
the dependent variables reported later, using age as the covariate. In
no case was the basic statistical significance of any result changed
(although, of course, significance levels were reduced somewhat).
Therefore, although age must be considered as a moderator in any
future analysis, it does not appear to be the major determinant of the
results to be presented below.

TABLE II. NTB VARIABLES AND THEIR DISCRIMINATION LEVELS FOR THE
EXPERIMENTAL GROUPS

TEST VARIABLE                              p-VALUE

SYMBOL DIGIT
  SCORE                                    <.0001
  % CORR                                   .332
TRAILS A TIME                              <.0001
TRAILS B TIME                              <.0001
STERNBERG MEMORY RETRIEVAL
  RT - SET 1                               .02
  RT - SET 2                               .03
  RT - SET 4                               .009
  SD - SET 1                               .02
  SD - SET 2                               .036
  SD - SET 4                               .004
  % CORR - SET 1                           .091
  % CORR - SET 2                           .202
  % CORR - SET 4                           .765
  RT - TOTAL                               .020
  % CORR - TOTAL                           .227
  SLOPE                                    .038
  INTERCEPT                                .029
DYNAMIC MEMORY (CONT. PERFORMANCE)
  RT                                       .002
  SD                                       .013
  % CORRECT                                .005
VERBAL THINKING TEST
  RT - PHYSICAL                            .009
  SD - PHYSICAL                            .007
  % CORR - PHYSICAL                        .0001
  RT - CATEGORY                            .005
  SD - CATEGORY                            .016
  % CORR - CATEGORY                        .01
  PHYS. - CAT. DIFFERENCE                  .036
  TOTAL % CORRECT                          .001
LOGICAL REASONING
  RT                                       .04
  SD                                       .060
  % CORRECT                                .009
SPATIAL PROCESSING
  RT                                       .213
  SD                                       .749
  % CORR                                   .108
STROOP COLOR WORD
  RT - CONFLICT                            .0003
  SD - CONFLICT                            .0006
  % CORR - CONFLICT                        .244
  RT - NON CONFLICT                        <.0001
  SD - NON CONFLICT                        .0003
  % CORR - NON CONFLICT                    .536
  CON-NON CONFLICT RT                      .083
  MEAN % CORR                              .082
VISUAL MONITORING
  RT                                       .055
  SD                                       .051
  HITS AFTER TIMEOUTS                      .365
  FALSE ALARMS                             .937
  TOTAL HITS                               .003
  MISSES                                   .003
  % CORRECT                                .003
INTERVAL PRODUCTION TEST
  DURATION                                 .05
  SD                                       .007
  IPT                                      .087
UNSTABLE TRACKING
  ERROR SCORE                              <.0001
  EDGE VIOLATIONS                          <.0001
ARITHMETIC
  RT                                       .0.3
  SD                                       .103
  % CORR                                   .609
  NUMBER ATTEMPTED                         .006
  NUMBER CORRECT                           .003
ZUNG DEPRESSION SCALE
  SCORE                                    <.0001
MANIFEST ANXIETY SCALE
  SCORE                                    <.0001
SHIPLEY SCALE
  SCORE                                    .021

Sex. Similarly, there were very few female subjects available in the
selected populations of civil pilots, and none among the VA patients.
However, it was possible to look at sex differences in performance
within the two “normal” groups. Analyses of variance were performed on
all variables between the eight female subjects and the 73 male
subjects. These revealed only 3 of the 62 variables significant at an
alpha level of .02 or below (the .05 protected alpha level). It is
therefore unlikely that there are true sex differences in performance
on any of the tests.

“Intelligence.” The Shipley score provides a crude measure of
intelligence, and these scores were significantly different among the
groups at the .02 alpha level. The pilot group scored higher than all
other groups. The depressives (29.5) and the substance abuse subjects
(28.7) were not different from each other, but were different from the
neurologicals (26.7). In effect, these results suggest again that
caution must be exercised in interpreting differences among
experimental groups. In the final analysis, many of the measures to be
used in any test battery will probably have to be moderated with an
age and intelligence correction factor.

Test  variables.  The  first  analysis  involved  performing 
independent  ANOVAs  for  the  five  experimental  groups 
(pilots,  non-pilot  normals,  depressives,  substance  abuse, 
and  neurologicals)  on  all  62  of  the  dependent  variables, 
plus  the  age  variable.  Results  of  these  analyses  are  pre¬ 
sented  in  Table  II,  and  reveal  that  42  of  the  62  variables 
(68%)  showed  F-ratios  with  probability  values  less  than 
.05.  This  number  of  significant  results  clearly  suggests 
that  the  combination  of  tests  in  the  battery  will  be  able 
to  differentiate  among  the  experimental  groups  to  a 
considerable  degree. 
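
As an illustration of the kind of analysis described above, the following is a minimal sketch of how a one-way ANOVA F-ratio can be computed for a single dependent variable across several groups. The function name and the scores are hypothetical and chosen only to make the arithmetic easy to follow; they are not data from the study, and the original analyses may have been carried out with different software and options.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of scores."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    # Between-groups sum of squares: group size times squared deviation
    # of each group mean from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-groups sum of squares: squared deviation of each score
    # from its own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Hypothetical scores for three groups (illustration only, not study data).
example_groups = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
f_ratio, df_b, df_w = one_way_anova_f(example_groups)
print(f_ratio, df_b, df_w)
```

The F-ratio would then be compared against the F distribution with the returned degrees of freedom to obtain a probability value such as those summarized in the text.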

TABLE III. VERSION 1.1 OF THE NTB

LEVEL 1          LEVEL 2                   LEVEL 3

TRAILS A         LOGICAL REAS. (% COR.)    MEMORY (SLOPE)
TRAILS B         DYNAMIC MEMORY (S.D.)     ZUNG DEPRESSION
SYMBOL DIGIT     ARITHMETIC (ATTEMPTS)     MANIFEST ANXIETY
TRACKING                                   DYN. MEMORY (R.T.)

A  goal  of  the  first  validation  study  was  to  arrive  at  a 
second-level battery of tests based on the results obtained
among  the  pathological  and  non-pathological  groups. 
Therefore,  once  the  sensitivity  of  each  test  was  estab¬ 
lished,  the  next  step  was  to  explore  the  nature  of  these 
differences  and  to  select  the  specific  tests  and  variables 
which  would  give  the  best  diagnosticity.  As  a  start,  post- 
hoc  (Newman-Keuls)  tests  of  all  significant  variables 
were  carried  out  for  each  of  the  proposed  levels  of  the 
battery.  The  results  of  these  analyses  were  then  inspected 
to  arrive  at  a  preliminary  list  of  variables  which  appeared 
to yield optimum differentiation among the experimental
groups. These tests were then aggregated into a
revised battery (designated version 1.1) to produce an
optimal  classification  of  subjects  based  on  this  sample. 
Optimization  is  appropriate  at  this  early  stage  of  test 
development,  rather  than  employing  a  split-half  or  jack¬ 
knife  procedure  to  cross-validate  the  tests  selected.  In 
view of this, it is obviously inappropriate to overinterpret
sophisticated  statistical  analyses. 
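
For readers unfamiliar with the jackknife procedure mentioned above (which was deliberately not used at this stage), the following is a minimal sketch of a leave-one-out estimate of classification accuracy for a single cut-off score. The midpoint cut-off rule, the function names, and the scores are all assumptions made for illustration, with higher scores taken to mean poorer performance; none of this is taken from the study.

```python
from statistics import mean


def midpoint_cutoff(normal_scores, patient_scores):
    """Cut-off halfway between the two group means (an assumed rule)."""
    return (mean(normal_scores) + mean(patient_scores)) / 2


def leave_one_out_accuracy(normal_scores, patient_scores):
    """Classify each subject with a cut-off derived without that subject."""
    correct = 0
    for i, held_out in enumerate(normal_scores):
        rest = normal_scores[:i] + normal_scores[i + 1:]
        # A normal subject is correctly classified if below the cut-off.
        if held_out < midpoint_cutoff(rest, patient_scores):
            correct += 1
    for j, held_out in enumerate(patient_scores):
        rest = patient_scores[:j] + patient_scores[j + 1:]
        # A patient is correctly classified if at or above the cut-off.
        if held_out >= midpoint_cutoff(normal_scores, rest):
            correct += 1
    return correct / (len(normal_scores) + len(patient_scores))


# Hypothetical completion times (illustration only, not study data).
normals = [10, 11, 12, 19]
patients = [20, 21, 22, 13]
print(leave_one_out_accuracy(normals, patients))
```

Because each subject is scored against a cut-off that was set without that subject's data, the resulting accuracy is usually lower than the accuracy obtained when the same sample is used both to set and to test the cut-off.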

The  above  comparisons  among  the  individual  experi¬ 
mental  groups  revealed  that,  as  expected,  the  Level  1  tests 
were  generally  excellent  at  differentiating  between 
pathological  and  non-pathological  groups,  but  were  not 
very  discriminating  among  the  pathological  groups.  This 
is  appropriate  for  a  first-level  screening  procedure.  Fur¬ 
ther  inspection  revealed  that  one  of  the  first  level  tests 
(the  Stroop  Color  Word  Test)  was  not  contributing  as 
much  to  this  differentiation  as  the  other  three  Level  1 
tests.  For  this  reason,  it  was  decided  to  eliminate  the 
Stroop  Test  from  the  battery. 

Similar analyses of each of the Level 2 and Level 3 tests
originally proposed resulted in several other changes.
The interval production test failed to identify normals or
any pathology group. On the other hand, both the
continuous memory test and the verbal thinking test
appeared to be more discriminating among pathology
groups than was originally hypothesized. Thus, they
both appeared more appropriate for Level 3 than for
Level 2. In their place, the logical thinking test and one
variable from the memory test (the standard deviation)
appeared to give the best second-level differentiation
between pathologicals and non-pathologicals, and these
tests were therefore moved into Level 2.

In  summary,  the  revised  version  (1.1)  of  the  battery 
included  most  of  the  tests  from  the  originally  proposed 
battery,  but  made  several  changes  in  the  order  of  test 
administration.  The  tests  and  variables  included  in  this 
version  of  the  battery  are  shown  in  Table  III. 

Having  maximized  the  tests  and  variables  which  ap¬ 
pear  to  have  the  ability  to  differentiate  pathology,  it  was 
next  necessary  to  develop  the  cut-off  scores  and  decision 
logic  to  be  used  in  automating  the  screening  process.  The 
test  set  data  were  used  to  produce  a  set  of  candidate  cut¬ 
off  scores  using  multiple  criteria.  A  code  was  then 
developed  that  capitalized  on  the  differing  diagnostic 
levels  of  the  battery.  This  code,  along  with  the  scoring 
paths  used  to  generate  it,  is  shown  in  Table  IV.  It  is 
recognized,  of  course,  that  these  statements  will  be 
modified  as  a  result  of  cross-validation  studies  and  fur¬ 
ther  experience  with  the  battery.  Whatever  their  final 
form,  these  will  inform  the  clinician  of  the  level  of  the 
subject's performance, and the probable diagnostic implications
of each performance level. Appropriately, the
statements  allow  the  clinician  a  considerable  degree  of 
latitude  in  determining  the  final  disposition.  They  are, 
however,  tied  rigorously  to  the  experimental  results. 
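
The step logic behind the messages in Table IV can be sketched roughly as follows. This is only an illustrative reading of the table, not the scoring code used in the battery: the function name, argument names, and test identifiers are hypothetical, the sketch assumes that any Level 1 pattern not covered by messages 1 to 3 sends the subject on to Level 2, and it omits the two special-case messages (7 and 8), whose triggering patterns would need the exact variable definitions.

```python
def ntb_message(tracking_tasks_failed, level1_other_failed,
                level2_failed=(), level3_failed=()):
    """Return the Table IV message number (1-6) for a pattern of failures.

    tracking_tasks_failed: number of tracking tasks failed at Level 1.
    level1_other_failed:   names of the other Level 1 tests failed
                           (Trails A, Trails B, Symbol Digit).
    level2_failed / level3_failed: names of tests failed at those levels;
                           only consulted if the subject reaches them.
    """
    other = set(level1_other_failed)

    # Level 1 outcomes that count as passing the screen.
    if tracking_tasks_failed == 0 and not other:
        return 1  # passed every Level 1 test
    if tracking_tasks_failed == 1 and not other:
        return 2  # failed one tracking task only
    if tracking_tasks_failed == 1 and len(other) == 1:
        return 3  # one tracking task plus one other Level 1 test

    # Assumption: every other Level 1 pattern goes on to Level 2.
    if not level2_failed:
        return 4  # passed all Level 2 tests
    if not level3_failed:
        return 5  # failed at Level 2 but passed all Level 3 tests
    return 6      # failed one or more Level 3 tests (general case)


# Hypothetical subject: fails Trails A and Trails B, then passes Level 2.
print(ntb_message(0, ["trails_a", "trails_b"]))
```

Returning a message number rather than a pass/fail verdict mirrors the intent described above: the output informs the clinician, who retains the final disposition.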

Classification  accuracy. 

Having  created  the  classification  algorithm  based  on 
the  data  from  the  present  experiment  (the  training  set), 
one  would  expect  that  the  classification  accuracy  of  this 
algorithm  will  be  optimal  and,  hopefully,  quite  high. 
This  proved  to  be  the  case  in  the  present  experiment. 
Version  1.1  of  the  NTB  successfully  identified  95  per¬ 
cent  of  the  true  positives  (5  percent  false  negative  rate). 
Further,  the  test  battery  may  do  equally  well  in  eliminat¬ 
ing  the  excessive  cost  associated  with  a  high  false  positive 
rate. At Level 1, only 14 (17%) out of 81 subjects were
“incorrectly” passed on to Level 2 testing. Of these, only
7 (8.6%) failed the Level 2 tests and were passed on to
Level 3. Of these seven subjects, one passed all the tests,
resulting in an overall false positive rate of 7.4%. These
rates (5% false negatives and 7% false positives) are, of

TABLE IV. RECOMMENDED DIAGNOSTIC MESSAGES FOR VARIOUS LEVELS OF
PERFORMANCE ON THE NTB

1. IF SUBJECT PASSES ALL TESTS IN LEVEL 1.

This subject has demonstrated performance on all tests in the screening
battery that is within the limits of subjects not diagnosed as having
neurological insult, affective disorders, or chronic substance abuse
problems.

2. IF SUBJECT FAILS ONE TRACKING TASK AND NO OTHER TEST.

This subject has passed all tests in the screening battery except a
demanding test of visual-motor coordination. Many normal subjects fail
this test. Therefore, if the subject shows no clinical signs of visual-motor
abnormalities, the screening battery should be considered to have been
passed.

3. IF THE SUBJECT FAILS ONE TRACKING TASK AND ANY ONE OF THE OTHER
THREE TESTS IN LEVEL 1.

This subject shows an overall pattern that is consistent with subjects not
having diagnosed neurological insult, affective disorders, or chronic
substance abuse problems. However, at least two of the individual tests
were failed. While this failure rate is not diagnostic, it is recommended
that increased attention be given to clinical signs of psychiatric or
neurological abnormality in subsequent examination. If no such signs
are present, the subject should be passed.

4. IF THE SUBJECT IS PASSED ON TO LEVEL 2, BUT PASSES ALL TESTS AT
THAT LEVEL.

This subject shows an overall pattern that is consistent with subjects not
having diagnosed neurological insult, affective disorders, or chronic
substance abuse problems. However, the subject has shown a
performance pattern that is weak in one or more skills. Such weaknesses
have not usually been associated with psychiatric or neurological
problems. However, increased attention should be given to clinical
signs of such problems in subsequent examination. In the absence of
such signs, the subject should be passed.

5. IF THE SUBJECT IS PASSED ON TO LEVEL 2 AND FAILS ANY TESTS AT THAT
LEVEL, BUT THEN PASSES ALL TESTS AT LEVEL 3.

This subject should be screened carefully for neurological or psychiatric
problems. The test battery suggests that the individual has a performance
or skill deficit that is shared by many individuals with such problems.
However, there is no specific indication of such problems in the
responses of the subject. Therefore, if clinical examination is totally
negative, the individual should be passed; otherwise, the subject
should be referred.

6. IF THE SUBJECT IS PASSED TO LEVEL 3, AND FAILS ONE OR MORE TESTS AT
THAT LEVEL (EXCEPTING THE SPECIFIC CASES NOTED BELOW).

This subject shows a pattern of performance that has been demonstrated
by individuals diagnosed as having psychiatric, neurological, or
substance abuse problems. Further testing is therefore strongly indicated.
It is recommended that this individual be given a more intensive
neurological and psychiatric screening and, if indicated, that further
referral for specialized testing be made.

7. IF THE SUBJECT IS PASSED ON TO LEVEL 2, AND THEN FAILS: DECISION AND
MEMORY SD SET 2, AND DECISION AND MEMORY SLOPE IN LEVEL 3.

This subject shows a pattern of performance that has been seen in
several individuals diagnosed as having depressive disorders. While
there are many other possible explanations for this pattern, it is
recommended that increased screening for psychiatric disturbance
should be carried out on this individual.

8. IF THE SUBJECT IS PASSED ON TO LEVEL 2, AND THEN FAILS: DYNAMIC
MEMORY ALONE (NO OTHER LEVEL 3)

This subject shows a pattern of performance that has been seen in some
individuals diagnosed as having substance abuse problems. There are
many other possible explanations for this pattern, and the data on this
relationship are tentative. Therefore, while it is recommended that
increased screening for substance abuse should be carried out on this
individual, a negative clinical finding should be considered definitive.

course, extremely good. If maintained, they would make
the  NTB  an  extremely  successful  screening  test. 
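
The false positive percentages quoted above follow directly from the counts given in the text, as the short calculation below shows. The variable and function names are illustrative only; the counts (81 non-pathological subjects, 14 passed on to Level 2, 7 passed on to Level 3, 1 cleared at Level 3) are those reported for the training sample.

```python
# Counts reported in the text for the non-pathological subjects.
n_normals = 81
passed_to_level2 = 14   # failed the Level 1 screen
passed_to_level3 = 7    # also failed at Level 2
cleared_at_level3 = 1   # passed every Level 3 test

# Subjects still flagged after all three levels.
final_false_positives = passed_to_level3 - cleared_at_level3


def pct(count, total):
    """Percentage rounded to one decimal place."""
    return round(100 * count / total, 1)


rates = {
    "after Level 1": pct(passed_to_level2, n_normals),
    "after Level 2": pct(passed_to_level3, n_normals),
    "after Level 3": pct(final_false_positives, n_normals),
}
for stage, rate in rates.items():
    print(f"False positive rate {stage}: {rate}%")
```

Each additional level reduces the proportion of normal subjects still flagged, which is the cost argument for the step procedure.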

Results  of  the  second  study. 

Given  the  encouraging  results  from  the  first  study 
reported  above,  a  second-generation  test  battery  (version 
2.0)  was  created.  Essentially,  the  tests  determined  in 
version 1.1 above were all re-programmed to operate on
an IBM XT or higher (or true clone). This involved
creating computer versions of the Trails and Symbol-Digit
tests, re-programming the tracking task to operate
with  a  joystick,  and  incorporating  the  scoring  criteria 
into  the  computer  so  that  evaluation  was  done  automati¬ 
cally. 

It  is  recognized,  of  course,  that  the  new  version  of  the 
battery  will  require  different  norms,  and  may  even  have 
a  different  sensitivity  to  the  pathological  subjects  than 
the  older  version.  Therefore,  as  described  above,  20  of 
the  subjects  who  had  participated  in  the  above  study 
were  re-tested  with  the  new  version  in  order  to  get  some 
idea  of  the  relationship  between  the  two  different  imple¬ 
mentations. 

Analyses of variance comparing scores on versions 1.1
and 2.0 were carried out to determine which scores
differed significantly. These analyses revealed that 11 out
of the 32 scores were indeed different between the two
test  administrations.  Of  these  differences,  5  were  on  tests 
that  were  significantly  different  in  format  between  the 
two  test  administrations.  Essentially,  the  data  indicate 
that  the  following  tests  are  much  harder  in  version  2.0 
than  in  version  1.1:  Trails  A,  Trails  B,  Symbol  Digit 
percent  correct,  and  tracking  losses.  Logical  reasoning 
reaction  time  was  faster  on  version  2.0  than  on  version 
1.1.  Mathematical  processing  also  appeared  easier  on 
version  2.0  for  all  variables,  except  that  subjects  got  fewer 
correct. 

This  number  of  differences  between  the  two  versions 
raises the possibility that the original preliminary validation
of version 1.1 may be negated. Thus, the degree to
which  version  2.0  was  able  to  discriminate  between 
pathological  and  non-pathological  groups  is  also  of 
prime  interest.  Results  of  these  analyses  are  presented  in 
Table V. The five test groups included pilots, non-pilot
normals, substance abusers, neurologically impaired subjects,
and depressives. A total of 19 out of the 32 variables
(59%) significantly discriminated between the pathological
and non-pathological subjects using version 2.0
of the test battery. This compares to a total of 68%
significant differences for version 1.1 of the battery.
Specifically,  it  is  seen  that,  of  the  10  tests  that  originally 
discriminated  between  groups  in  version  1.1,  7  (70%) 
also significantly discriminated in version 2.0. Further,
12 tests that were not significant in version 1.1 were
significant  in  version  2.0.  Thus,  the  basic  validation  of 
version  1.1  remains  defensible  for  version  2.0.  If  any¬ 
thing,  it  appears  that  version  2.0  might  be  even  more 
sensitive  to  differences  among  the  various  experimental 
groups. 
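
The counts in the preceding paragraph can be checked against the significance levels listed in Table V. The sketch below tallies the variables falling below the .05 level in each version; the variable labels are a best reading of the table layout (in particular, "losses" is assumed to belong to the tracking task), but the counts do not depend on how the rows are labeled.

```python
# (variable, significance in version 1.1, significance in version 2.0)
TABLE_V = [
    ("Trails A R.T.", .02, .005), ("Trails B R.T.", .006, .008),
    ("Symbol-Digit % correct", .42, .86), ("Symbol-Digit R.T.", .005, .019),
    ("Tracking error", .019, .108), ("Tracking losses", .001, .213),
    ("Logical R.T.", .436, .012), ("Logical S.D.", .215, .028),
    ("Logical % correct", .065, .0005),
    ("Arithmetic R.T.", .073, .005), ("Arithmetic S.D.", .029, .018),
    ("Arithmetic # attempt", .046, .029), ("Arithmetic # correct", .104, .015),
    ("Arithmetic % correct", .801, .017),
    ("Sternberg set 1 R.T.", .792, .129), ("Sternberg set 2 R.T.", .411, .025),
    ("Sternberg set 4 R.T.", .357, .007), ("Sternberg set 1 S.D.", .873, .145),
    ("Sternberg set 2 S.D.", .466, .026), ("Sternberg set 4 S.D.", .500, .002),
    ("Sternberg set 1 % cor", .680, .002), ("Sternberg set 2 % cor", .260, .926),
    ("Sternberg set 4 % cor", .703, .404), ("Sternberg overall R.T.", .528, .016),
    ("Sternberg overall %", .790, .779), ("Sternberg slope", .082, .405),
    ("Sternberg intercept", .749, .438),
    ("Zung depression", .0002, .0003), ("Manifest anxiety", .027, .0005),
    ("Dynamic memory R.T.", .013, .747), ("Dynamic memory S.D.", .189, .879),
    ("Dynamic memory % correct", .088, .420),
]

ALPHA = 0.05
sig_v11 = {name for name, p11, _ in TABLE_V if p11 < ALPHA}
sig_v20 = {name for name, _, p20 in TABLE_V if p20 < ALPHA}
retained = sig_v11 & sig_v20   # significant in both versions
gained = sig_v20 - sig_v11     # significant only in version 2.0

print(len(TABLE_V), len(sig_v11), len(sig_v20), len(retained), len(gained))
print(f"Version 2.0: {100 * len(sig_v20) / len(TABLE_V):.0f}% of variables")
```

Tallied this way, the table is consistent with the figures in the text: 10 variables significant in version 1.1, 7 of them retained in version 2.0, and 12 newly significant, for 19 of 32 in version 2.0.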

DISCUSSION 

The  Neuropsychological  Test  Battery  (NTB)  de¬ 
scribed  above  appears  to  have  excellent  potential  to 
answer  the  need  for  a  computerized  test  of  cognitive 
function that could serve as an adjunct to the routine
physical  examination.  Its  major  strengths  lie  in  the 
theory-based  approach  and  in  the  use  of  a  step  procedure. 
The  former  offers  the  potential  for  extensive  testing  of  all 
domains  of  higher  cognitive  function,  while  the  latter 
provides  a  time-  and  cost-efficient  screening  at  increas¬ 
ingly  more  diagnostic  levels.  The  results  of  the  initial 
validation  studies  are  encouraging.  Clearly,  the  tests 
selected  discriminate  among  differing  groups  of  normal 
and  pathological  subjects.  Obviously,  with  further  evo¬ 
lution  of  the  battery,  additional  precision  and  efficiency 
can  be  added  to  the  battery  as  it  presently  stands. 

The  NTB  is  still  in  the  early  stages  of  development. 
Although  many  tests  are  implemented  clinically  on  the 
basis  of  less  evidence,  the  very  nature  of  the  theory-based 
approach  demands  that  far  more  study  be  carried  out  on 
the  NTB  before  it  can  be  validated  for  clinical  imple¬ 
mentation.  Required  studies  fall  into  three  general  catego¬ 
ries:  1)  further  criterion  validation  and  cross-validation,  2) 
exploration  of  additional  and  alternative  tests  and  proce¬ 
dures  for  the  battery,  and  3)  human  factors  issues  related  to 
actual  clinical  implementation. 

The  first  type  of  study  is  in  many  ways  the  most 
critical.  The  present  studies,  while  establishing  the  valid¬ 
ity  of  the  basic  concepts,  barely  scratch  the  surface. 
Cross-validating  the  tests  and  scoring  criteria  is  an  obvi¬ 
ous  first  step.  It  would  be  expected  that  this  will  reveal 
somewhat  less  accurate  prediction  than  was  obtained  in 
the training samples. Re-adjustment of criteria, re-definition
of the interpretative statements, and perhaps even
elimination  of  tests  that  do  not  cross-validate  may  be 
necessary  to  further  refine  the  battery.  This  will  interact 
with  the  second  series  of  studies,  in  which  continuing 
developments  in  the  field  of  cognitive  science  must  be 
monitored  for  identification  of  new  resources  and/or 
tests.  It  can  certainly  not  be  claimed  that  the  multiple 
resources  theory  is  complete  in  describing  the  entire 
domain  of  cognitive  function.  Thus,  the  NTB  must  be 
viewed  as  an  evolving  series  of  specific  probes. 

TABLE V. COMPARISON OF THE DIAGNOSTIC SENSITIVITY OF VERSIONS 1.1 AND 2.0 OF
THE NTB AMONG THE TEST GROUPS

                                        SIGNIFICANCE LEVELS
TEST                 VARIABLE           VERSION 1.1    VERSION 2.0

TRAILS A             R.T.               .02            .005
TRAILS B             R.T.               .006           .008
SYMBOL-DIGIT         % CORRECT          .42            .86
                     R.T.               .005           .019
TRACKING             ERROR              .019           .108
                     LOSSES             .001           .213
LOGICAL REASONING    R.T.               .436           .012
                     S.D.               .215           .028
                     % CORRECT          .065           .0005
ARITHMETIC           R.T.               .073           .005
                     S.D.               .029           .018
                     # ATTEMPT          .046           .029
                     # CORRECT          .104           .015
                     % CORRECT          .801           .017
STERNBERG            SET 1 R.T.         .792           .129
                     SET 2 R.T.         .411           .025
                     SET 4 R.T.         .357           .007
                     SET 1 S.D.         .873           .145
                     SET 2 S.D.         .466           .026
                     SET 4 S.D.         .500           .002
                     SET 1 % COR        .680           .002
                     SET 2 % COR        .260           .926
                     SET 4 % COR        .703           .404
                     OVERALL R.T.       .528           .016
                     OVERALL %          .790           .779
                     SLOPE              .082           .405
                     INTERCEPT          .749           .438
ZUNG DEPRESSION                         .0002          .0003
MANIFEST ANXIETY                        .027           .0005
DYNAMIC MEMORY       R.T.               .013           .747
                     S.D.               .189           .879
                     % CORRECT          .088           .420

In  addition,  procedural  or  software  changes  might  be 
incorporated,  which  could  increase  the  diagnosticity  of 
the  battery.  For  example,  the  battery  could  keep  track  of 
each  individual’s  scores  over  time,  and  automatically 
apply  curve-fitting  techniques  to  discern  atypical  pat¬ 
terns  of  change,  which  might  provide  early  detection  of 
a  variety  of  conditions  involving  slow  cognitive  deterio¬ 
ration. 
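
One simple form such curve fitting could take is a least-squares trend line through an individual's scores across repeated examinations, with a flag raised when the trend exceeds some limit. The sketch below is an assumption about how this might be done, not a procedure from the battery; the function names, the scores, and the slope limit are hypothetical, and a usable limit would have to come from normative data.

```python
def least_squares_slope(xs, ys):
    """Slope of the ordinary least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def flag_atypical_change(exam_years, scores, slope_limit):
    """True if the fitted change per year exceeds slope_limit.

    Assumes a higher score means poorer performance (e.g. reaction time).
    """
    return least_squares_slope(exam_years, scores) > slope_limit


# Hypothetical reaction times (msec) at four annual examinations.
years = [0, 1, 2, 3]
reaction_times = [500, 510, 540, 590]
print(least_squares_slope(years, reaction_times))
print(flag_atypical_change(years, reaction_times, slope_limit=20))
```

A trend computed against the individual's own history does not rely on group norms alone, which is what would make it attractive for detecting slow deterioration.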

Finally,  the  third  series  of  required  studies  involves 
making  the  battery  appropriate  for  general  clinical  use. 
The  effect  of  subject  intelligence,  age,  sex,  reading  abil¬ 
ity,  motivation,  etc.,  must  be  explored  in  more  detail 
than  has  been  done  thus  far.  Instructions  must  be  made 
understandable for any type of individual, and the entire
battery must be human engineered so that it becomes a
pleasant  and  self-motivating  experience  for  everyone. 

In  spite  of  the  above  needs  and  the  difficulty  of  the 
task ahead, it is important not to lose sight of the fact that
a  new  type  of  routine  clinical  testing  is  embodied  by  this 
battery.  Automated  behavioral  assessment  at  this  level  of 
theoretical  sophistication  has  not  been  generally  intro¬ 
duced  into  the  routine  physical  examination.  As  noted 
by  the  AMA,  current  neurological  screening  appears 
increasingly  inadequate  in  assessing  the  higher-level  cog¬ 
nitive  functions  of  interest  in  today’s  occupational  envi¬ 
ronment  (AMA,  1984). 

The present test development suggests that it may
well  be  possible  to  transfer  previously  expensive  and 
complex  diagnostic  approaches  to  the  screening  battery, 
without  sacrificing  time  or  precision,  by  taking  advan¬ 
tage  of  computerized  testing  and  decision  processes.  In 
this  sense,  the  NTB  may  be  the  precursor  of  many  new 
test  approaches  which  will  be  routinely  used  by  examin¬ 
ers. 

CONCLUSIONS 

Based on the results of the two studies reported here,
the  following  conclusions  appear  justified: 

1.  Age,  sex,  and  intelligence  level  appear  to  exert 
moderator  effects  on  the  tests  proposed  for  the  bat¬ 
tery,  and  therefore  must  be  taken  into  account  in  any 
future  implementations. 

2. Computerized, performance-based tests are capable
of achieving remarkable degrees of screening
and  diagnostic  accuracy  between  normals  and  certain 
groups  of  subjects  with  diagnosed  pathology. 

3. A step approach to screening provides a time-
and cost-efficient method of screening individuals for
neurological,  psychiatric,  or  substance  abuse  prob¬ 
lems. 

4. Maximum interpretative efficiency in the screening
procedure can be achieved through the use of a
theory-based  battery  of  tests  which  probes  the  re¬ 
sources  relevant  to  a  particular  real-world  job  or  task. 

RECOMMENDATIONS 

In  its  present  form,  the  NTB  is  recommended  for  use 
by  experienced  professionals  in  an  experimental  mode, 
with  appropriate  confirmatory  testing  in  all  cases.  Under 
these  conditions,  the  NTB  can  provide  objective  backup 
data  for  clinical  decisions. 

Recognizing  that  the  present  studies  establish  only  the 
basic  proof-of-concept  for  this  approach,  a  series  of 
increasingly  specific  and  definitive  studies  should  be 
carried  out  to  permit  the  NTB  to  evolve  into  a  stand¬ 
alone  battery  capable  of  being  used  in  the  examiner’s 
office.  Several  major  types  of  study  are  recommended: 

1. Cross-validation studies should be carried out to
establish  the  actual  predictive  accuracy  of  present  and 
revised  scoring  algorithms.  Interpretative  statements 
should  be  refined  in  view  of  these  studies  and  clinical 
experience. 


2.  Changes  and  additions  to  the  basic  battery  should 
be  made  as  the  field  of  cognitive  science  progresses. 
Specifically,  candidate  tests  which  probe  additional 
human  resources  should  be  studied  in  order  to  expand 
the  applicability  of  the  NTB  to  additional  occupa¬ 
tional  categories. 

3.  In  addition  to  the  inclusion  of  new  tests, 
opportunities  to  improve  the  NTB  through  advanced 
mathematical  analysis  of  results  should  be  explored. 
These  include  techniques  for  monitoring  a  client’s 
performance  over  time,  and  enhanced  discriminative 
analyses. 

4.  The  battery  should  be  further  human  engineered 
in  such  a  way  that  it  can  be  self-administered  and 
automatically  scored.  Ultimately,  feedback  and 
appropriate  follow-up  recommendations  should  be 
provided  directly  to  the  client. 